
@jayhemnani9910

Purpose
Adds a reusable ETL transformation that normalizes timestamps to UTC based on each row's source time zone.

Features

  • Converts each timestamp to UTC using the row's time zone field
  • Stores the results in timestamp_utc and timezone_original
  • Gracefully handles invalid formats and missing values
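The per-value conversion in the first bullet can be sketched as follows. This assumes naive local datetimes and IANA zone names such as `America/New_York`; the helper name `to_utc` is hypothetical and not part of petl. With pytz, the zone should be attached with `localize()`, because `datetime.replace(tzinfo=pytz.timezone(...))` typically attaches the zone's historical local mean time (LMT) offset instead of the offset in force on that date.

```python
def to_utc(naive_dt, tz_name):
    """Hypothetical helper: interpret naive_dt as local time in tz_name, return it in UTC."""
    import pytz  # deferred import; pytz is an optional dependency of petl

    tz = pytz.timezone(tz_name)  # raises pytz.UnknownTimeZoneError for unknown names
    # localize() picks the UTC offset valid on that date, including DST.
    local_dt = tz.localize(naive_dt)
    return local_dt.astimezone(pytz.utc)
```

For example, 10:00 in New York becomes 15:00 UTC on 2023-12-01 (EST, UTC-5) but 14:00 UTC on 2023-07-01 (EDT, UTC-4).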

Real-World Use
Essential for pipelines dealing with global log files, sensors, APIs, or multi-region data warehouses; it keeps timestamps consistent and time-series reporting accurate.

Tests
Included unit tests:

  • Valid time zone normalization
  • Invalid time zone detection
  • Missing timestamp error

All tests passed

@jayhemnani9910 marked this pull request as ready for review April 30, 2025 22:27
original_tz = row[tz_col]

# Parse the timestamp
naive_dt = datetime.fromisoformat(original_ts)

Check warning: Code scanning / Pylint (reported by Codacy)

Class 'datetime' has no 'fromisoformat' member
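`datetime.fromisoformat` only exists on Python 3.7 and later, which is why this call fails on the Python 3.6 CI job (the Pylint warning is presumably from the same cause). If 3.6 support were preferred over skipping the test, one option is `datetime.strptime` with explicit formats. This sketch assumes timestamps look like `2023-12-01T10:00:00` (no fractional seconds or UTC offset); the helper name `parse_naive_iso` and the accepted formats are hypothetical.

```python
from datetime import datetime

# Assumed input layouts: ISO-8601 date and time with a 'T' or space separator.
_FORMATS = ('%Y-%m-%dT%H:%M:%S', '%Y-%m-%d %H:%M:%S')


def parse_naive_iso(value):
    """Hypothetical helper: parse a naive ISO-8601 timestamp on Python 3.6+.

    Raises ValueError if none of the accepted formats match.
    """
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted layout
    raise ValueError('unsupported timestamp format: %r' % (value,))
```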
@coveralls

Pull Request Test Coverage Report for Build 14765567445

Details

  • 34 of 35 (97.14%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 91.125%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
petl/test/transform/test_normalize_timezone.py | 18 | 19 | 94.74%

Totals
Change from base Build 14278138538: +0.02%
Covered Lines: 13419
Relevant Lines: 14726

💛 - Coveralls

@juarezr requested review from arturponinski and juarezr May 1, 2025 00:19
@juarezr added the Feature A nice to have thing that we don't have yet label May 1, 2025
Member

@juarezr left a comment


Would you mind making some changes to this PR?

@@ -0,0 +1,38 @@
from datetime import datetime
import pytz
Member


Currently, pytz is not a hard requirement when using petl.

Because of this, the CI jobs running on Windows are failing with the following error:

=================================== ERRORS ====================================
_______ ERROR collecting petl/test/transform/test_normalize_timezone.py _______
ImportError while importing test module 'D:\a\petl\petl\petl\test\transform\test_normalize_timezone.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
petl\test\transform\test_normalize_timezone.py:2: in <module>
    from petl.transform.normalize_timezone import normalize_timezone
petl\transform\normalize_timezone.py:2: in <module>
    import pytz
E   ModuleNotFoundError: No module named 'pytz'

Would you mind moving the pytz import so that it only runs when this functionality is explicitly used?

{'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'},
{'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'}
]
result = list(normalize_timezone(input_data))
Member


It looks like Python 3.6 doesn't support this:

  _________________ TestNormalizeTimezone.test_basic_conversion __________________
  
  table = [{'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'}, {'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'}]
  timestamp_col = 'timestamp', tz_col = 'timezone'
  
      def normalize_timezone(table, timestamp_col='timestamp', tz_col='timezone'):
          """
          Normalize timestamps to UTC while retaining original timezone.
      
          Args:
              table: petl table (iterable of rows/dicts)
              timestamp_col (str): column name with timestamp strings
              tz_col (str): column name with timezone name (e.g., 'America/New_York')
      
          Yields:
              Each row with two added fields: 'timestamp_utc' and 'timezone_original'
          """
          for row in table:
              try:
                  original_ts = row[timestamp_col]
                  original_tz = row[tz_col]
      
                  # Parse the timestamp
  >               naive_dt = datetime.fromisoformat(original_ts)
  E               AttributeError: type object 'datetime.datetime' has no attribute 'fromisoformat'
  
  petl/transform/normalize_timezone.py:22: AttributeError
  
  During handling of the above exception, another exception occurred:
  
  self = <petl.test.transform.test_normalize_timezone.TestNormalizeTimezone testMethod=test_basic_conversion>
  
      def test_basic_conversion(self):
          input_data = [
              {'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'},
              {'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'}
          ]
  >       result = list(normalize_timezone(input_data))
  
  petl/test/transform/test_normalize_timezone.py:11: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...

Can you rework this test so that it is skipped on Python <= 3.6, please?

tz_col (str): column name with timezone name (e.g., 'America/New_York')

Yields:
Each row with two added fields: 'timestamp_utc' and 'timezone_original'
Member


Adding a code example here would be interesting, but not required.
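A docstring example in doctest style could look like the sketch below. It assumes `timestamp_utc` is stored as an ISO-8601 string with a `+00:00` suffix and `timezone_original` as the input zone name; the PR's actual output types may differ, and the invalid-input handling mentioned in the PR description is omitted for brevity. The body is included only so the example can run (for instance with `python -m doctest`).

```python
from datetime import datetime


def normalize_timezone(table, timestamp_col='timestamp', tz_col='timezone'):
    """
    Normalize timestamps to UTC while retaining the original time zone.

    Args:
        table: iterable of dict rows
        timestamp_col (str): column name with naive ISO-8601 timestamp strings
        tz_col (str): column name with an IANA time zone name (e.g., 'America/New_York')

    Yields:
        Each row with two added fields: 'timestamp_utc' and 'timezone_original'

    Example:

        >>> rows = [{'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'}]
        >>> out = list(normalize_timezone(rows))
        >>> out[0]['timestamp_utc']
        '2023-12-01T15:00:00+00:00'
        >>> out[0]['timezone_original']
        'America/New_York'
    """
    import pytz  # optional dependency, imported only when the transform runs

    for row in table:
        naive_dt = datetime.fromisoformat(row[timestamp_col])  # Python 3.7+
        local_dt = pytz.timezone(row[tz_col]).localize(naive_dt)
        out = dict(row)  # copy so the input rows are not mutated
        out['timestamp_utc'] = local_dt.astimezone(pytz.utc).isoformat()
        out['timezone_original'] = row[tz_col]
        yield out
```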


Labels

Feature A nice to have thing that we don't have yet


3 participants